The partial sum of the states of a Markov chain or more generally a Markov source is asymptotically normally
distributed under suitable conditions. One of these conditions is that the variance is unbounded. A simple combinatorial
characterization of Markov sources which satisfy this condition is given in terms of cycles of the underlying graph of
the Markov chain. Also Markov sources with higher dimensional alphabets are considered.

Furthermore, the case of an unbounded covariance between two coordinates of the Markov source is combinatorically 
characterized. If the covariance is bounded, then the two coordinates are asymptotically independent. 

The results are illustrated by several examples, like the number of specific blocks in 0-1-sequences and the Hamming
weight of the width-$w$ non-adjacent form.

1 Introduction

We investigate the random vector defined as the n-th partial sum of a Markov source over a higher
dimensional alphabet. Under suitable conditions, this random vector is asymptotically jointly normally
distributed. Its mean and variance-covariance matrix are linear in the number of summands (cf. [13,
Theorem 2.22]). On the one hand, these conditions include irreducibility and aperiodicity of the underlying
graph of the Markov chain, which can be checked easily for a given Markov chain. On the other hand, we
also have to check that the variance-covariance matrix is regular, which requires technical computations.
In this article, we give a simple combinatorial characterization of Markov sources whose corresponding
variance-covariance matrix is singular.

The covariance between two coordinates of this random vector is also of interest: If it is bounded, then 
these two coordinates are asymptotically independent because of the joint normal distribution. We give a 
combinatorial characterization of this case. 

These characterizations are given in terms of subgraphs of the underlying graph of the Markov chain:
For the variance-covariance matrix, we only have to consider all cycles. A regular variance-covariance
matrix will be proven to be equivalent to the linear independence of certain functions of cycles of the
underlying graph of the Markov chain. For the characterization of an unbounded covariance, we have
to consider functional digraphs. This result is proven using an extension of the Matrix-Tree Theorem in 


As Markov sources are closely related to automata and transducers, our results can also be used for 
the asymptotic analysis of sequences which can be computed by transducers. This includes the Hamming 
weight of many syntactically defined digit expansions as performed in [|^ |^. Furthermore, 

occurrences of digits or subwords can also be computed by transducers. Their variance (and covariance)
is analyzed in [0101 Hill]- 

In [18], the variance of the output of a transducer as well as the covariance between the input and the
output were analyzed. In this article, we consider the more general setting of Markov chains. The proofs
are similar to those in [18], but the results are valid in a broader context and can be formulated more
clearly. In contrast to [18], we allow the input sequence of the transducer to be generated by a Markov
source. This allows us to model an input sequence for a transducer whose letters do not occur with equal 
probabilities and/or have dependencies between the letters. The precise relation between the setting of 
this article and that of [Q is given in Section |^. 

As an example, we prove that the Hamming weight of the so-called width-$w$ non-adjacent form is
asymptotically jointly normally distributed for two different values of $w \ge 2$. The width-$w$ non-adjacent
form is a binary digit expansion with digits in $\{0, \pm 1, \pm 3, \ldots, \pm(2^{w-1} - 1)\}$ and the syntactical rule
that at most one of any w adjacent digits is non-zero. This digit expansion exists and is unique for every 
integer (cf. [ pl| , [^). Furthermore, it has minimal Hamming weight among all digit expansions with this 
base and digit set. 

The outline of this article is as follows: In Section 2, we define our setting and the types of graphs we use
to state the combinatorial characterization of independent output sums and singular variance-covariance
matrices. These characterizations are given in Section 3 and examples are given in Section 4. In Section 5,
we finally prove the results of Section 3.


2 Preliminaries 

In this article, a finite Markov chain consists of a finite state space $\{1, \ldots, M\}$, a finite set of transitions
$\mathcal{E}$ between the states, each with a positive transition probability, and a unique¹ initial state 1. We denote
the transition probability for a transition $e$ by $p_e$. Then we have

\[ \sum_{\substack{e \in \mathcal{E} \\ e \text{ starts in } i}} p_e = 1 \]

for all states $i$. Note that for all transitions $e \in \mathcal{E}$, we require $p_e > 0$. Further note that there may be
multiple transitions between two states, but always only finitely many of them. This may be useful for
different outputs later on.

The transition probabilities induce a probability distribution on the paths of length $n$ starting in the
initial state 1. Let $X_n$ be a random path of length $n$ according to this model.

¹ This is no restriction as we can always add an additional state and the transitions starting in this state with probabilities
corresponding to the non-degenerate initial distribution. The output functions are then extended by mapping these transitions to 0.

Fig. 1: A small example of a transducer.

All states of the underlying digraph of the Markov chain are assumed to be accessible from the initial
state. Contracting each strongly connected component of the underlying digraph gives an acyclic digraph,
the so-called condensation. We assume that this condensation has only one leaf (i.e., one vertex with
out-degree 0). The strongly connected component corresponding to this leaf is called the final component.
We assume that the period (i.e., the greatest common divisor of the lengths of all cycles) of this final
component is 1. We call such Markov chains finally connected and finally aperiodic.

Additionally, we use output functions $k\colon \mathcal{E} \to \mathbb{R}$. The corresponding random variable $K_n$ is the sum of
all values of $k$ along a random path $X_n$. We call $K_n$ the output sum of the Markov chain with respect to
$k$. We use several output functions $k_1$, ..., $k_m$ and the corresponding random variables $K_n^{(1)}$, ..., $K_n^{(m)}$
simultaneously for one Markov chain.
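The definition of the output sum can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions: it uses a hypothetical two-state chain with states 0 and 1 (not the chain of any figure in this article), uniform transition probabilities, initial state 0, and an output function that is 1 on the loop at state 1 and 0 elsewhere. The names `TRANSITIONS` and `sample_output_sum` are illustrative only.

```python
import random

# Hypothetical two-state chain (an assumption for illustration).
# Each transition is (start, end, probability, output k(e)).
TRANSITIONS = [
    (0, 0, 0.5, 0), (0, 1, 0.5, 0),
    (1, 0, 0.5, 0), (1, 1, 0.5, 1),  # output 1 marks an 11-block
]

def sample_output_sum(n, transitions, initial_state, rng):
    """Sample a random path of length n and return the sum of k along it."""
    state, total = initial_state, 0
    for _ in range(n):
        outgoing = [t for t in transitions if t[0] == state]
        weights = [t[2] for t in outgoing]
        # Pick one outgoing transition according to its probability.
        _, state, _, output = rng.choices(outgoing, weights=weights, k=1)[0]
        total += output
    return total

rng = random.Random(0)  # fixed seed so that the run is reproducible
samples = [sample_output_sum(1000, TRANSITIONS, 0, rng) for _ in range(200)]
mean = sum(samples) / len(samples)
print("empirical mean of K_n for n = 1000:", mean)
```

For this toy chain, the empirical mean should be close to $n/4$, in line with the linear growth of the expected value discussed below.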

Remark 2.1. Usually, one is interested in a function evaluated at the sequence of random states of the 
Markov chain. This is equivalent to this setting with an output function of the transitions: For the one 
direction, the restriction of the output function to the outgoing transitions of one state is constant for every 
state. For the other direction, we use the standard construction of the Markov chain with state space 

$\{(i, j) \mid 1 \le i, j \le M\}$.

Thus, our setting can be seen as a Markov source with a finite set of m-dimensional vectors as alphabet. 

We are interested in the joint distribution of the random variables $K_n^{(1)}$, ..., $K_n^{(m)}$. For one coordinate,
we will prove that the expected value of $K_n^{(i)}$ is $e_i n + O(1)$ for constants $e_i$. The variance-covariance
matrix of $K_n^{(1)}$, ..., $K_n^{(m)}$ will turn out to be $\Sigma n + O(1)$ for a matrix $\Sigma$. We call $\Sigma$ the asymptotic
variance-covariance matrix and its entries the asymptotic variances and covariances.

We will combinatorically characterize Markov chains with output functions such that the variance-covariance
matrix is regular. Furthermore, we give a combinatorial characterization of the case that the
asymptotic covariance is zero. As this is only influenced by two output functions, we restrict ourselves to
$K_n^{(1)}$ and $K_n^{(2)}$ in this case.

Remark 2.2. Markov chains with output functions are closely related to transducers with a probability 
distribution for the input: A transducer is defined to consist of a finite set of states, an initial state, a set 
of final states, an input alphabet, an output alphabet and a finite set of transitions, where a transition starts 
in one state, leads to another state and has an input and an output label from the corresponding alphabets. 
See Chapter 1] for a more formal definition. An example of a transducer is given in Figure 1. We label
the transitions with “input label | output label”. The initial state is marked by an ingoing arrow starting at 
no other state and the final states are marked by outgoing arrows leading to no other state. 

A Markov chain with one output function can be obtained by a transducer with additional probability 
distributions for the outgoing transitions of each state and by deleting the input labels of the transducer. 

If we have two transducers where only the outputs of the transitions are different, we can choose
probability distributions for the outgoing transitions of each state. Then we obtain a Markov chain with
two output functions. Thus, we can use our results for two output functions (see Examples 4.2 and 4.3).

Remark 2.3. We can additionally have final output functions $f\colon \{1, \ldots, M\} \to \mathbb{R}$ for each output function
$k$ and redefine the random variable $K_n$ as the sum of the values of the output function $k$ along a random
path $X_n$ plus the final output $f$ of the final state of this path. We will see that this does not change the
main terms of the asymptotic behavior. Thus, the results in Section 3 are still valid (see also Remark 5.5).


Remark 2.4. The Parry measure consists of probabilities $p_e$ such that every path of length $n$ has the same weight
up to a constant factor (cf. p4| , ^]). If we are interested in probabilities such that every path of length n 
starting in the initial state 1 has exactly the same weight, we have to use the Parry measure with additional 
exit weights: Each path is additionally weighted by these exit weights according to the final state of the 
path (cf. [|^ Lemma 4.1]). 

However, the sum of the weights of all paths of length $n$ is no longer normalized: It differs from 1 by
an exponentially small error term for $n \to \infty$. This gives an approximate equidistribution of all paths of
length $n$. As we are interested in the asymptotic behavior for $n \to \infty$, the expected value and the variance
of the corresponding measurable function $K_n$ can still be defined as usual.

If we use these exit weights $w_s$ in our setting, the main terms of the asymptotic behavior are not
changed. Thus, the theorems in Section 3 are still valid (see also Remark 5.5).

These exit weights can also be used to simulate final and non-final states of a transducer by setting the 
weights of non-final states to 0. However, not all exit weights of the final component are allowed to be 
zero. 


Next, we define some subgraphs of the underlying graph of the final component and extend the
probabilities and the output functions to these subgraphs.


Definition 2.5. We define the following types of directed graphs as subgraphs of the final component of 
the Markov chain. 


• A rooted tree is a weakly connected digraph with one vertex which has out-degree 0, while all other
vertices have out-degree 1. The vertex with out-degree 0 is called the root of the tree.

• A functional digraph is a digraph whose vertices have out-degree 1. Each component of a functional
digraph consists of a directed cycle and some trees rooted at vertices of the cycle. For a functional
digraph $D$, let $\mathcal{C}_D$ be the set of all cycles of $D$.

The probabilities $p_e$ can be multiplicatively extended to a weight function for arbitrary subgraphs of
the Markov chain: Let $D$ be any subgraph of the underlying graph of the Markov chain, then define the
weight of $D$ by

\[ p_D = \prod_{e \in D} p_e. \]

For a path $P$ of length $n$, this is exactly the probability $\mathbb{P}(X_n = P)$.

However, the output function $k$ is additively extended to cycles $C$ of the underlying graph of the Markov
chain by

\[ k(C) = \sum_{e \in C} k(e). \]

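This can further be extended to functional digraphs:

Definition 2.6. Let $\mathcal{D}_1$ and $\mathcal{D}_2$ be the sets of all spanning subgraphs of the final component of the Markov
chain $\mathcal{M}$ which are functional digraphs and have one and two components, respectively.

For functions $g$ and $h\colon \mathcal{E} \to \mathbb{R}$, we define

\[
\begin{aligned}
g(\mathcal{D}_1) &= \sum_{D \in \mathcal{D}_1} p_D \sum_{C \in \mathcal{C}_D} g(C), \\
(g, h)(\mathcal{D}_1) &= \sum_{D \in \mathcal{D}_1} p_D \sum_{C \in \mathcal{C}_D} g(C) h(C), \\
(g, h)(\mathcal{D}_2) &= \sum_{D \in \mathcal{D}_2} p_D \sum_{C_1 \in \mathcal{C}_D} \sum_{\substack{C_2 \in \mathcal{C}_D \\ C_2 \ne C_1}} g(C_1) h(C_2).
\end{aligned}
\]

As functions $g$ and $h$, we use the output functions $k_1$, ..., $k_m$ and the constant function $\mathbb{1}(e) = 1$.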
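The sums of Definition 2.6 can be evaluated by brute force for small chains: choose one outgoing transition per state, detect the cycles of the resulting functional digraph, and weight the cycle values by the product of the chosen probabilities. The sketch below assumes a hypothetical strongly connected two-state chain with made-up probabilities `P00` and `P11`; all function names are illustrative and not taken from the article.

```python
from itertools import product

# Hypothetical two-state example; each edge is (start, end, probability).
P00, P11 = 0.7, 0.4
STATES = [0, 1]
EDGES = [(0, 0, P00), (0, 1, 1 - P00), (1, 0, 1 - P11), (1, 1, P11)]

def cycles_of(choice):
    """Cycles (lists of edges) of the functional digraph given by `choice`,
    which maps every state to its single chosen outgoing edge."""
    cycles, done = [], set()
    for start in choice:
        walk, state = [], start
        while state not in done and state not in walk:
            walk.append(state)
            state = choice[state][1]
        if state not in done:  # the walk ran into itself: a new cycle
            cycles.append([choice[s] for s in walk[walk.index(state):]])
        done.update(walk)
    return cycles

def functional_digraphs():
    """Yield (weight p_D, cycles of D) for every spanning functional digraph."""
    outgoing = [[e for e in EDGES if e[0] == s] for s in STATES]
    for combo in product(*outgoing):
        weight = 1.0
        for e in combo:
            weight *= e[2]
        yield weight, cycles_of({e[0]: e for e in combo})

def single_sum(g):
    """g(D_1): weighted sum of g(C) over the one-component digraphs."""
    return sum(weight * sum(g(e) for e in cycles[0])
               for weight, cycles in functional_digraphs() if len(cycles) == 1)

def pair_sum(g, h, components):
    """(g, h)(D_1) for components == 1 and (g, h)(D_2) for components == 2."""
    total = 0.0
    for weight, cycles in functional_digraphs():
        if len(cycles) != components:
            continue
        gs = [sum(g(e) for e in c) for c in cycles]
        hs = [sum(h(e) for e in c) for c in cycles]
        if components == 1:
            total += weight * gs[0] * hs[0]
        else:  # ordered pairs of the two distinct cycles
            total += weight * sum(gs[i] * hs[j]
                                  for i in range(2) for j in range(2) if i != j)
    return total

one = lambda e: 1  # the constant function 1(e) = 1
print(single_sum(one), pair_sum(one, one, 1), pair_sum(one, one, 2))
```

The number of components of a functional digraph equals its number of cycles, which is why `len(cycles)` is used to separate $\mathcal{D}_1$ from $\mathcal{D}_2$.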


3 Main Results 


In this section, we present the combinatorial characterization of output functions of Markov chains which
are asymptotically independent and of Markov chains with output functions with a singular variance-covariance
matrix. The proofs can be found in Section 5.

If the underlying directed graph of the Markov chain is $j$-regular, every transition has probability $1/j$,
we only have two output functions and the first output function $k_1\colon \mathcal{E} \to \{0, 1, \ldots, j - 1\}$ is such that the
restriction of $k_1$ to the outgoing transitions of one state is bijective for every state, then these results are
stated in [18] (see also Remark 2.2).


The next definition describes a sequence of random variables whose difference from its expected value 
is bounded for all elements. 


Definition 3.1. The output sum $K_n$ of a Markov chain is called quasi-deterministic if there is a constant
$a \in \mathbb{R}$ such that

\[ K_n = an + O(1) \]

holds for all $n$.


Next we give the combinatorial characterization of output sums with bounded variance in the case of a 
not necessarily independent identically distributed input sequence. 

Theorem 1. For a finite, finally connected and finally aperiodic Markov chain $\mathcal{M}$ with an output function
$k$, the following assertions are equivalent:

(a) The asymptotic variance $v$ of the output sum is 0.

(b) There exists a state $s$ of the final component and a constant $a \in \mathbb{R}$ such that

\[ k(C) = a \mathbb{1}(C) \]

holds for every closed walk $C$ of the final component visiting the state $s$ exactly once.

(c) There exists a constant $a \in \mathbb{R}$ such that

\[ k(C) = a \mathbb{1}(C) \]

holds for every directed cycle $C$ of the final component of $\mathcal{M}$.

In that case, $an + O(1)$ is the expected value of the output sum and Statement (b) holds for all states $s$
of the final component.

If $\mathcal{M}$ is furthermore strongly connected, the following assertion is also equivalent:

(d) The random variable $K_n$ is quasi-deterministic with constant $a$.

In the case that the value of the output function is 0 or 1 for each transition, there are only two trivial 
output functions with asymptotic variance zero. 

Corollary 3.2. Let $k\colon \mathcal{E} \to \{0, 1\}$. Then the asymptotic variance $v$ is zero if and only if the output
function $k$ is constant on the final component.

The next theorem extends Theorem 1 to the joint distribution of several simultaneous output sums by
combinatorically describing the case of a singular variance-covariance matrix.

Theorem 2. Let $\mathcal{M}$ be a finite, finally connected, finally aperiodic Markov chain with $m$ output functions
$k_1$, ..., $k_m$. Then the variance-covariance matrix $\Sigma$ is regular if and only if the functions $\mathbb{1}$, $k_1$, ..., $k_m$
are linearly independent as functions from the vector space of cycles of the final component to the real
numbers, i.e. there do not exist real constants $a_0$, ..., $a_m$, not all zero, such that

\[ a_0 \mathbb{1}(C) + a_1 k_1(C) + \cdots + a_m k_m(C) = 0 \tag{1} \]

holds for all cycles (or equivalently, for all closed walks) $C$ of the final component.

The random variables $K_n^{(1)}$, ..., $K_n^{(m)}$ are asymptotically jointly normally distributed if and only if $\Sigma$
is regular.
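The criterion of Theorem 2 is a finite linear algebra check: list the directed cycles of the final component, record the vector $(\mathbb{1}(C), k_1(C), \ldots, k_m(C))$ for each, and test whether these rows have full column rank $m + 1$. The following sketch assumes a strongly connected chain (so the final component is the whole graph) and uses a hypothetical two-state example in which the transition $1 \to 0$ is assumed to carry the output for 10-blocks and the loop at 1 the output for 11-blocks. The helper names are illustrative.

```python
from fractions import Fraction

# Hypothetical example: edges are (start, end, (k_1(e), ..., k_m(e))).
EDGES = [(0, 0, (0, 0)), (0, 1, (0, 0)), (1, 0, (1, 0)), (1, 1, (0, 1))]

def simple_cycles(edges):
    """Yield every directed simple cycle as a list of edges.
    Each cycle is reported once, starting at its smallest state."""
    def extend(start, state, path, visited):
        for e in edges:
            if e[0] != state:
                continue
            if e[1] == start:
                yield path + [e]
            elif e[1] > start and e[1] not in visited:
                yield from extend(start, e[1], path + [e], visited | {e[1]})
    for start in sorted({e[0] for e in edges}):
        yield from extend(start, start, [], {start})

def rank(rows):
    """Rank of a matrix over the rationals by Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def is_regular(edges):
    """True if 1, k_1, ..., k_m are linearly independent on the cycles."""
    m = len(edges[0][2])
    rows = [[len(c)] + [sum(e[2][i] for e in c) for i in range(m)]
            for c in simple_cycles(edges)]
    return rank(rows) == m + 1

# An output function that is constant 1 gives k(C) = 1(C) on every cycle.
CONSTANT = [(0, 0, (1,)), (0, 1, (1,)), (1, 0, (1,)), (1, 1, (1,))]
print(is_regular(EDGES), is_regular(CONSTANT))
```

Restricting to simple cycles is sufficient here because every closed walk decomposes into simple cycles and all functions involved are additive.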

Remark 3.3. Theorems 1 and 2 and Corollary 3.2 are independent of the choice of the probabilities of
the transitions. Only the structure of the underlying graph of the Markov chain and the output functions 
influence the result. Note, however, that according to our general assumptions, all transitions have positive 
probability. 

The next theorem gives a combinatorial characterization of output functions of a Markov chain which
are asymptotically independent. As this characterization is given by the covariance, we can restrict
ourselves to two output functions without loss of generality.

Fig. 2: Transducer $T(w)$ to compute the Hamming weight of the width-$w$ non-adjacent form.

Theorem 3. Let $\mathcal{M}$ be a finite, finally connected, finally aperiodic Markov chain with two output functions
$k_1$ and $k_2$.

Then the random variable $K_n^{(i)}$ has the expected value $e_i n + O(1)$ and the variance $v_i n + O(1)$ where
the constants are

\[
\begin{aligned}
e_i &= \frac{k_i(\mathcal{D}_1)}{\mathbb{1}(\mathcal{D}_1)}, \\
v_i &= \frac{1}{\mathbb{1}(\mathcal{D}_1)} \bigl( (k_i - e_i \mathbb{1}, k_i - e_i \mathbb{1})(\mathcal{D}_1) - (k_i - e_i \mathbb{1}, k_i - e_i \mathbb{1})(\mathcal{D}_2) \bigr)
\end{aligned}
\]

for $i = 1, 2$.

The covariance of $K_n^{(1)}$ and $K_n^{(2)}$ is $cn + O(1)$ with the constant

\[ c = \frac{1}{\mathbb{1}(\mathcal{D}_1)} \bigl( (k_1 - e_1 \mathbb{1}, k_2 - e_2 \mathbb{1})(\mathcal{D}_1) - (k_1 - e_1 \mathbb{1}, k_2 - e_2 \mathbb{1})(\mathcal{D}_2) \bigr). \tag{2} \]

The random variables $K_n^{(1)}$ and $K_n^{(2)}$ are asymptotically independent if and only if

\[ (k_1 - e_1 \mathbb{1}, k_2 - e_2 \mathbb{1})(\mathcal{D}_1) = (k_1 - e_1 \mathbb{1}, k_2 - e_2 \mathbb{1})(\mathcal{D}_2). \]

In the case that the expected values of $K_n^{(1)}$ and $K_n^{(2)}$ are both bounded, i.e. $e_1 = e_2 = 0$, these random
variables are asymptotically independent if and only if

\[ (k_1, k_2)(\mathcal{D}_1) = (k_1, k_2)(\mathcal{D}_2). \]
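As a sanity check of the formulas in Theorem 3, consider a hypothetical example that is not taken from the article: the two-state chain with states 0 and 1, all four transition probabilities equal to $1/2$, and a single output function $k$ that is 1 on the loop at state 1 and 0 elsewhere, so that $K_n$ counts 11-blocks of a fair 0-1-sequence up to boundary effects. There are three spanning functional digraphs with one component (each of weight $1/4$, with cycle the loop at 0, the loop at 1, and the cycle $0 \to 1 \to 0$, respectively) and one with two components (both loops, weight $1/4$). Under these assumptions, the constants work out as follows.

```latex
\begin{align*}
\mathbb{1}(\mathcal{D}_1) &= \tfrac14 \cdot 1 + \tfrac14 \cdot 1 + \tfrac14 \cdot 2 = 1, \qquad
k(\mathcal{D}_1) = \tfrac14 \cdot 0 + \tfrac14 \cdot 1 + \tfrac14 \cdot 0 = \tfrac14, \qquad
e = \tfrac14, \\
(k - e\mathbb{1}, k - e\mathbb{1})(\mathcal{D}_1)
  &= \tfrac14 \Bigl( \bigl(-\tfrac14\bigr)^2 + \bigl(\tfrac34\bigr)^2 + \bigl(-\tfrac12\bigr)^2 \Bigr)
   = \tfrac{14}{64}, \\
(k - e\mathbb{1}, k - e\mathbb{1})(\mathcal{D}_2)
  &= \tfrac14 \cdot 2 \cdot \bigl(-\tfrac14\bigr)\bigl(\tfrac34\bigr) = -\tfrac{6}{64}, \\
v &= \tfrac{14}{64} + \tfrac{6}{64} = \tfrac{5}{16}.
\end{align*}
```

The resulting growth rate $5/16$ agrees with the classical linear term of the variance of the number of 11-blocks in a sequence of fair coin flips.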


4 Examples 

In this section, we first prove the asymptotic joint normal distribution of the Hamming weights of two
different digit expansions by using Theorem 2. Then we investigate the independence of length 2 blocks
of 0-1-sequences by using Theorem 3. In both cases we start with two transducers to construct a Markov
chain with two output functions, once as a Cartesian product, once via Remark 2.2.

Example 4.1 (Width-$w$ non-adjacent forms). Let $2 \le w_1 < w_2$ be integers. We consider the
asymptotic joint distribution of the Hamming weight of the width-$w_1$ non-adjacent form ($w_1$-NAF) and the
Hamming weight of the $w_2$-NAF. The width-$w$ non-adjacent form is a binary digit expansion with digit
set $\{0, \pm 1, \pm 3, \ldots, \pm(2^{w-1} - 1)\}$ and the syntactical rule that at most one of any $w$ adjacent digits is
non-zero.

It will turn out that this distribution is normal if and only if the variance-covariance matrix is
regular. Using Theorem 2, we have to find closed walks in the corresponding Markov chain such that all
coefficients in (1) have to be zero.

The transducer $T(w)$ in Figure 2 computes the Hamming weight of the $w$-NAF of the integer $n$ when
the input is the binary expansion of $n$ (cf. [15]). It has $w + 1$ states. Next, we construct the Cartesian
product of the transducers for $w_1$ and $w_2$ and choose any non-degenerate probability distribution, i.e. with
all probabilities non-zero, for the outgoing transitions of a state. Thus, we obtain a Markov chain $\mathcal{M}$ with
$(w_1 + 1)(w_2 + 1)$ states with two different output functions $h_1$ and $h_2$ corresponding to the outputs of the
transducers for $w_1$ and $w_2$, respectively. We can now use Theorem 2 to prove that these two Hamming
weights are asymptotically jointly normally distributed.

The Cartesian product of two closed walks in $T(w_1)$ and $T(w_2)$ with the same input sequence is a
closed walk in $\mathcal{M}$. We construct three different closed walks and prove that all three coefficients in (1)
have to be zero. For brevity, we denote a closed walk in the Cartesian product $\mathcal{M}$ and its projections to
$T(w_1)$ and $T(w_2)$ by the same letter.

First, we choose the closed walk $C_1$ starting in state 1 with input sequence 0. We obtain $h_1(C_1) = 0$
in $T(w_1)$, $h_2(C_1) = 0$ in $T(w_2)$ and $\mathbb{1}(C_1) = 1$. Second, we choose the closed walk $C_2$ starting in 1
with input sequence $10^{w_2 - 1}$. Because of $w_1 < w_2$ and the loop at state 1, $C_2$ is a closed walk in $T(w_1)$
and $T(w_2)$. We obtain $h_1(C_2) = 1$ in $T(w_1)$, $h_2(C_2) = 1$ in $T(w_2)$ and $\mathbb{1}(C_2) = w_2$. The third choice
depends on whether $w_1 = w_2 - 1$ or not:

• $w_1 \ne w_2 - 1$: We choose the closed walk $C_3$ starting in 1 with input sequence

where $a = \max(w_2 - 2w_1, 0)$. On the one hand, this is a closed walk in $T(w_1)$ consisting of two
times the cycle $1 \to w_1 \to 1$ and $a$ times the loop at state 1. On the other hand, this is a closed
walk in $T(w_2)$ consisting of the cycle $1 \to w_2 \to 1$ and the correct number of loops at state 1. We
obtain $h_1(C_3) = 2$ in $T(w_1)$, $h_2(C_3) = 1$ in $T(w_2)$ and $\mathbb{1}(C_3) = \max(w_2, 2w_1)$.

• $w_1 = w_2 - 1$: We choose the closed walk $C_3$ starting in 1 with input sequence

On the one hand, this is a closed walk in $T(w_1)$ consisting of three times the cycle $1 \to w_1 \to 1$. On
the other hand, this is a closed walk in $T(w_2)$ consisting of the closed walk $1 \to w_2 \to w_2 + 1 \to
w_2 \to 1$ and the correct number of loops at state 1. We obtain $h_1(C_3) = 3$ in $T(w_1)$, $h_2(C_3) = 2$
in $T(w_2)$ and $\mathbb{1}(C_3) = 3w_1$.

This yields a system of linear equations for the coefficients $a_0$, $a_1$ and $a_2$ with coefficient matrix

\[
\begin{pmatrix} 1 & 0 & 0 \\ w_2 & 1 & 1 \\ \max(w_2, 2w_1) & 2 & 1 \end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix} 1 & 0 & 0 \\ w_2 & 1 & 1 \\ 3w_1 & 3 & 2 \end{pmatrix},
\]

which only has the trivial solution. Thus, the Hamming weights of the $w_1$-NAF and the $w_2$-NAF are
asymptotically jointly normally distributed, independently of the choice of the distributions for the Markov
chain.
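That both coefficient matrices only admit the trivial solution can be confirmed by computing their determinants. The following sketch evaluates them for a range of small widths; the function names and the chosen range are illustrative assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three row tuples."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def coefficient_matrix(w1, w2):
    """Rows (1(C), h_1(C), h_2(C)) for the closed walks C_1, C_2, C_3."""
    if w1 != w2 - 1:
        third = (max(w2, 2 * w1), 2, 1)
    else:
        third = (3 * w1, 3, 2)
    return [(1, 0, 0), (w2, 1, 1), third]

# All pairs 2 <= w1 < w2 <= 11 (the upper bound is arbitrary).
dets = {(w1, w2): det3(coefficient_matrix(w1, w2))
        for w2 in range(3, 12) for w1 in range(2, w2)}
print(set(dets.values()))
```

Expanding along the first row shows that the determinant only depends on the lower right $2 \times 2$ block, so it does not depend on $w_1$ and $w_2$ at all.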

The next two examples investigate the asymptotic independence of length two blocks of 0-1-sequences.

Example 4.2 (10- and 11-blocks). The two transducers in Figure 3 count the number of 10- and 11-blocks
in 0-1-sequences. After deleting the outputs, both transducers are the same. Thus, any non-degenerate
probability distribution on the outgoing edges of the states gives a Markov chain with two output functions
$k_{10}$ (for the 10-blocks) and $k_{11}$ (for the 11-blocks).

Because of the two loops and the cycle $0 \to 1 \to 0$, Theorem 2 implies that the number of 10- and
11-blocks is asymptotically normally distributed.

Fig. 3: Transducers to compute the number of 10- and 11-blocks.

Fig. 4: Functional digraphs of the transducers of Examples 4.2 and 4.3. Panel (B) shows $\mathcal{D}_2$.

The next question is: For which choices of probability distributions is the number of 10- and 11-blocks
asymptotically independent? All functional digraphs with one or two components are given in Figure 4.
Using Theorem 3, we obtain the following system of equations for the values of the probabilities such that
the numbers of 11-blocks and 10-blocks are asymptotically independent: first by definition

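\[
\begin{aligned}
1 &= p_{0\to0} + p_{0\to1}, \\
1 &= p_{1\to0} + p_{1\to1},
\end{aligned}
\]

then by Theorem 3

\[
e_{10} = \frac{p_{0\to1} p_{1\to0}}{p_{0\to1} p_{1\to1} + 2 p_{0\to1} p_{1\to0} + p_{0\to0} p_{1\to0}}, \qquad
e_{11} = \frac{p_{0\to1} p_{1\to1}}{p_{0\to1} p_{1\to1} + 2 p_{0\to1} p_{1\to0} + p_{0\to0} p_{1\to0}},
\]

and finally for the independence

\[
\begin{aligned}
&p_{0\to1} p_{1\to1} (-e_{10})(1 - e_{11}) + p_{0\to1} p_{1\to0} (1 - 2e_{10})(-2e_{11}) + p_{0\to0} p_{1\to0} (-e_{10})(-e_{11}) \\
&\qquad = p_{0\to0} p_{1\to1} (-e_{10})(-e_{11}) + p_{0\to0} p_{1\to1} (-e_{10})(1 - e_{11}).
\end{aligned}
\]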

This system has non-trivial real solutions, i.e. solutions where all probabilities are non-zero, with

\[ p_{0\to0} = -\frac{1}{2} p_{1\to1} + 2 - \frac{1}{2} \sqrt{p_{1\to1}^2 - 8 p_{1\to1} + 8} \]

for all $0 < p_{1\to1} < 1$. Then we have $2 - \sqrt{2} < p_{0\to0} < 1$.

Thus, for these transition probabilities, the number of 10-blocks and the number of 11-blocks are
asymptotically independent.

One such example of a non-trivial solution is $p_{1\to1} = p_{1\to0} = 0.5$, $p_{0\to0} \approx 0.7192$ and
$p_{0\to1} \approx 0.2808$. Note that for the symmetric distribution $p_{0\to0} = p_{0\to1} = p_{1\to1} = p_{1\to0} = 0.5$, we obtain
asymptotic dependence of the number of 10- and 11-blocks.
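The stated solution can be checked numerically by inserting it into the system of Example 4.2. The sketch below encodes the expected values and the independence equation directly; the function names `covariance_gap` and `independent_p00` are illustrative and do not appear in the article.

```python
from math import sqrt

def covariance_gap(p00, p11):
    """Left-hand side minus right-hand side of the independence equation
    of Example 4.2, for given loop probabilities p00 and p11."""
    p01, p10 = 1 - p00, 1 - p11
    denom = p01 * p11 + 2 * p01 * p10 + p00 * p10
    e10 = p01 * p10 / denom
    e11 = p01 * p11 / denom
    lhs = (p01 * p11 * (-e10) * (1 - e11)
           + p01 * p10 * (1 - 2 * e10) * (-2 * e11)
           + p00 * p10 * (-e10) * (-e11))
    rhs = (p00 * p11 * (-e10) * (-e11)
           + p00 * p11 * (-e10) * (1 - e11))
    return lhs - rhs

def independent_p00(p11):
    """Value of p00 that makes the covariance vanish, as a function of p11."""
    return -p11 / 2 + 2 - sqrt(p11 ** 2 - 8 * p11 + 8) / 2

for p11 in (0.1, 0.5, 0.9):
    p00 = independent_p00(p11)
    print(p11, round(p00, 4), covariance_gap(p00, p11))
print("symmetric case:", covariance_gap(0.5, 0.5))
```

In exact terms, dividing the independence equation by the common non-zero factors reduces it to the quadratic $p_{0\to0}^2 - (4 - p_{1\to1}) p_{0\to0} + 2 = 0$, whose smaller root is the formula above.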

Example 4.3 (00- and 11-blocks). The two transducers in Figure 5 count the number of 00- and 11-blocks
in 0-1-sequences. They have the same underlying graph and the same input labels. Thus, choosing any
non-degenerate probability distribution of the outgoing edges of the states yields a Markov chain with two
output functions.

Fig. 5: Transducers to compute the number of 00- and 11-blocks: (A) 00-blocks, (B) 11-blocks.

Because of the two loops and the cycle $0 \to 1 \to 0$, Theorem 2 implies that the number of 00- and
11-blocks is asymptotically normally distributed.

The next question is: For which choices of probability distributions is the number of 00- and 11-blocks
asymptotically independent? The functional digraphs of the final component are the same as in
Example 4.2, see again Figure 4. By Theorem 3, the system of equations for the transition probabilities
$p_e$ such that the two output functions are asymptotically independent is: first by definition

\[
\begin{aligned}
1 &= p_{0\to0} + p_{0\to1}, \\
1 &= p_{1\to0} + p_{1\to1},
\end{aligned}
\]

then by Theorem 3

\[
e_{00} = \frac{p_{0\to0} p_{1\to0}}{p_{0\to1} p_{1\to1} + 2 p_{0\to1} p_{1\to0} + p_{0\to0} p_{1\to0}}, \qquad
e_{11} = \frac{p_{0\to1} p_{1\to1}}{p_{0\to1} p_{1\to1} + 2 p_{0\to1} p_{1\to0} + p_{0\to0} p_{1\to0}},
\]

and finally for the independence

\[
\begin{aligned}
&p_{0\to1} p_{1\to1} (-e_{00})(1 - e_{11}) + p_{0\to1} p_{1\to0} (-2e_{00})(-2e_{11}) + p_{0\to0} p_{1\to0} (1 - e_{00})(-e_{11}) \\
&\qquad = p_{0\to0} p_{1\to1} (1 - e_{00})(1 - e_{11}) + p_{0\to0} p_{1\to1} (-e_{00})(-e_{11}).
\end{aligned}
\]

These equations have no solution with $0 < p_e < 1$ for all transitions $e$. Thus, the numbers of 00- and
11-blocks are asymptotically dependent for all choices of the input distributions, as expected.
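The non-existence of a solution can also be verified by hand. The following derivation is a sketch; the abbreviations $a$, $b$, $x$, $y$ and $N$ are introduced only for this computation and are not notation of the article, and LHS and RHS denote the two sides of the independence equation above.

```latex
\begin{align*}
&a = p_{0\to0}, \quad b = p_{1\to1}, \quad x = p_{0\to1} = 1 - a, \quad y = p_{1\to0} = 1 - b, \quad
  N = xb + 2xy + ay, \\
&e_{00} = \frac{ay}{N}, \qquad e_{11} = \frac{xb}{N}, \\
&\mathrm{LHS} = e_{00} e_{11} (xb + 4xy + ay) - xb\, e_{00} - ay\, e_{11}
  = \frac{abxy\,(2xy - N)}{N^2}, \\
&\mathrm{RHS} = ab\,(1 - e_{00} - e_{11} + 2 e_{00} e_{11})
  = \frac{2abxy\,(N + ab)}{N^2}, \\
&\mathrm{LHS} - \mathrm{RHS} = \frac{abxy\,(2xy - 3N - 2ab)}{N^2} < 0
  \quad \text{since } N > 2xy.
\end{align*}
```

Since $\mathbb{1}(\mathcal{D}_1) = N$, Theorem 3 then gives the strictly negative covariance constant $c = -abxy\,(3N - 2xy + 2ab)/N^3$; for uniform probabilities this evaluates to $-3/16$.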

5 Proofs 

In this section, we prove the results from Section 3. Most of the proofs follow along the same ideas as in
[18]. The main differences are that one has to replace “complete transducer” by “Markov chain” and the
input sum by the output sum $K_n^{(1)}$.

We first prove Theorem 3 with the help of two lemmas. For one of these lemmas, we use a version of
the Matrix-Tree Theorem for weighted directed forests proved in [^, At the end of this section, we 
prove Theorems 1 and 2.

Definition 5.1. Let $A, B \subseteq \{1, \ldots, N\}$. Let $\mathcal{F}_{A,B}$ be the set of all forests which are spanning subgraphs
of the final component of the Markov chain $\mathcal{M}$ with $|A|$ trees such that every tree is rooted at some vertex
$a \in A$ and contains exactly one vertex $b \in B$.

Let $A = \{i_1, \ldots, i_n\}$ and $B = \{j_1, \ldots, j_n\}$ with $i_1 < \cdots < i_n$ and $j_1 < \cdots < j_n$. For $F \in \mathcal{F}_{A,B}$,
we define a function $g\colon B \to A$ by $g(j) = i$ if $j$ is in the tree of $F$ which is rooted in vertex $i$. We further
define the function $h\colon A \to B$ by $h(i_k) = j_k$ for $k = 1, \ldots, n$. The composition $g \circ h\colon A \to A$ is a
permutation of $A$. We define $\operatorname{sign} F = \operatorname{sign} g \circ h$.

If $|A| \ne |B|$, then $\mathcal{F}_{A,B} = \emptyset$. If $|A| = |B| = 1$, then $\operatorname{sign} F = 1$ and $\mathcal{F}_{A,B}$ consists of all spanning
trees rooted in $a \in A$.

Theorem (All-Minors-Matrix-Tree Theorem [^j, For a directed, weighted graph with loops and 

multiple edges, let $L = (l_{ij})_{1 \le i, j \le N}$ be the Laplacian matrix, that is $\sum_{j=1}^{N} l_{ij} = 0$ for every $i = 1, \ldots, N$
and $-l_{ij}$ is the sum of the weights $p_e$ of all edges $e$ from $i$ to $j$ for $i \ne j$. Then, for $|A| = |B|$, the minor
$\det L_{A,B}$ satisfies

\[ \det L_{A,B} = (-1)^{\sum_{i \in A} i + \sum_{j \in B} j} \sum_{F \in \mathcal{F}_{A,B}} p_F \operatorname{sign} F \]

where $L_{A,B}$ is the matrix $L$ whose rows with index in $A$ and columns with index in $B$ are deleted.

The All-Minors-Matrix-Tree Theorem is still valid for $|A| \ne |B|$ if we assume that the determinant of
a non-square matrix is 0. For notational simplicity, we use this convention in the rest of this section.

Definition 5.2. The transition matrix $W(x_1, \ldots, x_m)$ of a Markov chain with $M$ states and $m$ output
functions $k_1$, ..., $k_m$ is an $M \times M$ matrix whose $(i, j)$-th entry is

\[ \sum_{e\colon i \to j} p_e x_1^{k_1(e)} \cdots x_m^{k_m(e)} \]

where $p_e$ is the probability of the transition $e$.

Let $A(x_1, \ldots, x_m)$ be the $N \times N$ transition matrix of the final component of the Markov chain. Let
the order of the states be such that the transition matrix of the whole Markov chain $W(x_1, \ldots, x_m)$ has
the block structure

\[ W(x_1, \ldots, x_m) = \begin{pmatrix} * & * \\ 0 & A(x_1, \ldots, x_m) \end{pmatrix} \tag{3} \]

where $*$ denotes any matrix. If the Markov chain is strongly connected, the matrices $*$ are not present
(they have 0 rows).

We first use the All-Minors-Matrix-Tree Theorem to connect the derivatives of the characteristic
polynomial of the transition matrix with a sum of weighted digraphs in the next lemma.

Lemma 5.3. For $f(x_1, x_2, z) = \det(I - zA(x_1, x_2))$, we have



for i = 1,2. 

This lemma can be proven in the same way as 


[p^ Lemma 5.3] using the All-Minors-Matrix-Tree 


Theorem [|[ ^ . 


The following lemma will be used for $m > 2$ output functions later on.

Lemma 5.4. Let $f(x_1, \ldots, x_m, z) = \det(I - zA(x_1, \ldots, x_m))$. Then there is a unique dominant root
$z = \rho(x_1, \ldots, x_m)$ of $f$ in a neighborhood of $(1, \ldots, 1)$.

The moment generating function of $(K_n^{(1)}, \ldots, K_n^{(m)})$ has the asymptotic expansion

\[ \mathbb{E}\bigl(\exp(s_1 K_n^{(1)} + \cdots + s_m K_n^{(m)})\bigr) = e^{u(s_1, \ldots, s_m) n + v(s_1, \ldots, s_m)} \bigl(1 + O(\kappa^n)\bigr) \]

where $\kappa < 1$,

\[ u(s_1, \ldots, s_m) = -\log \rho(e^{s_1}, \ldots, e^{s_m}) \]

and $v(s_1, \ldots, s_m)$ are analytic functions in a small neighborhood of $(0, \ldots, 0)$.


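Proof: The moment generating function of $(K_n^{(1)}, \ldots, K_n^{(m)})$ is

\[ \mathbb{E}\bigl(\exp(s_1 K_n^{(1)} + \cdots + s_m K_n^{(m)})\bigr) = [z^n]\, v_1^\top \bigl(I - zW(e^{s_1}, \ldots, e^{s_m})\bigr)^{-1} v_2(e^{s_1}, \ldots, e^{s_m}) \]

for the initial vector $v_1$, and a vector $v_2(x_1, \ldots, x_m)$ encoding all the final information of the states²,
where we write $[z^n] b(z)$ for the coefficient of $z^n$ in the power series $b$. Because of the block structure of
the transition matrix $W$ of the whole Markov chain in (3), we obtain

\[
\begin{aligned}
[z^n]\, v_1^\top \bigl(I - zW(x_1, \ldots, x_m)\bigr)^{-1} v_2(x_1, \ldots, x_m)
  &= [z^n] \frac{F_1(x_1, \ldots, x_m, z)}{\det(I - zW(x_1, \ldots, x_m))} \\
  &= [z^n] \frac{F_1(x_1, \ldots, x_m, z)}{F_2(x_1, \ldots, x_m, z) f(x_1, \ldots, x_m, z)}
\end{aligned}
\]

for “polynomials” $F_1$ and $F_2$, i.e. finite linear combinations of $x_1^{\alpha_1} \cdots x_m^{\alpha_m} z^{\beta}$ for $\alpha_j \in \mathbb{R}$ and $\beta$ a
non-negative integer. The function $F_2$ corresponds to the determinant of the non-final part of the Markov
chain.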

We obtain the coefficient of $z^n$ by singularity analysis (cf. [p|]): Since the final component of $\mathcal{M}$
is again a Markov chain, the dominant singularity of $1/f(1, \ldots, 1, z)$ is 1 by the theorem of
Perron-Frobenius (cf. 0). By the aperiodicity of the final component, this dominant singularity is unique and it
is $\rho(1, \ldots, 1) = 1$.

Next, we consider the non-final components of the Markov chain using the same arguments as in [18].
The corresponding non-final component $\mathcal{M}_0$ is not a Markov chain as the transition matrix is not
stochastic. Let $\mathcal{M}_0^+$ be the Markov chain that is obtained from $\mathcal{M}_0$ by adding loops with the missing probabilities
where necessary. The dominant eigenvalue of the transition matrix of $\mathcal{M}_0^+$ is 1. As the transition matrices
of $\mathcal{M}_0$ and $\mathcal{M}_0^+$ satisfy element-wise inequalities but are not equal (at $(x_1, \ldots, x_m) = (1, \ldots, 1)$), the
theorem of Perron-Frobenius (cf. 0 Theorem 8.8.1]) implies that the dominant eigenvalues of $\mathcal{M}_0$ have
absolute value less than 1. Thus, the dominant singularities of $F_2(1, \ldots, 1, z)^{-1}$ are at $|z| > 1$.

As A(1, ..., 1, z) = $(1 - z)^{-1}$, we obtain $F_1(1, \ldots, 1) \ne 0$.

Thus, there is a unique dominant singularity of

\[ \frac{F_1(1, \ldots, 1, z)}{F_2(1, \ldots, 1, z) f(1, \ldots, 1, z)} \]

which is $\rho(1, \ldots, 1) = 1$. This also holds for $(x_1, \ldots, x_m)$ in a small neighborhood of $(1, \ldots, 1)$ by
the continuity of the eigenvalues of the transition matrices. Thus, $\rho(x_1, \ldots, x_m)$ is this unique dominant
singularity.

² This information is the final output (see Remark 2.3) and the exit weight (see Remark 2.4), included in
the $i$-th coordinate of $v_2(x_1, \ldots, x_m)$. This does not change the asymptotic behavior (see Remark 5.5).

Now, singularity analysis (cf. [Q]) implies the statement of this lemma. □ 


Remark 5.5. The main term of the asymptotic expansion of the moment generating function only
depends on $\rho(x_1, \ldots, x_m)$ and therefore on $f(x_1, \ldots, x_m, z)$. It does not depend on the “polynomials”
$F_1(x_1, \ldots, x_m, z)$ and $F_2(x_1, \ldots, x_m, z)$. Thus, only the final component influences the main term.
Neither the states in the non-final part of the Markov chain nor the final outputs and exit weights influence
the main term.

Now, we can use the previous two lemmas to prove Theorem 3.


Proof of Theorem 3: By Lemma 5.4 for two output functions $k_1$ and $k_2$, the moment generating function
satisfies the conditions of the Quasi-Power Theorem []I^ Theorem 5.1], which yields the expected value

\[ \mathbb{E}\bigl(K_n^{(1)}, K_n^{(2)}\bigr) = n \operatorname{grad} u(0) + O(1) \]

and the variance-covariance matrix

\[ n H_u(0) + O(1) \]

with $\operatorname{grad} u(0)$ and $H_u(0)$ the gradient and the Hessian of $u$ at 0, respectively. Furthermore, we obtain
an asymptotic joint normal distribution of the standardized random vector if the Hessian is not singular
by [ jlSj Theorem 3.9]. Otherwise, the limiting random vector is either a pair of degenerate random
variables, or a degenerate and normally distributed one, or a linear transformation thereof. Thus, the random
variables $K_n^{(1)}$ and $K_n^{(2)}$ are asymptotically independent if and only if the covariance is zero.

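By implicit differentiation, we obtain the following formulas for the constants of the moments in terms
of the partial derivatives of $f$:

\[
\begin{aligned}
e_i &= \frac{f_{x_i}}{f_z}, \\
v_i &= \frac{1}{f_z^3} \bigl( f_{x_i}^2 (f_{zz} + f_z) + f_z^2 (f_{x_i x_i} + f_{x_i}) - 2 f_{x_i} f_z f_{x_i z} \bigr), \\
c &= \frac{1}{f_z^3} \bigl( f_{x_1} f_{x_2} (f_{zz} + f_z) + f_z^2 f_{x_1 x_2} - f_{x_2} f_z f_{x_1 z} - f_{x_1} f_z f_{x_2 z} \bigr)
\end{aligned}
\]

for $i = 1, 2$, where all partial derivatives are evaluated at $(x_1, x_2, z) = (1, 1, 1)$.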

Now, Lemma 5.3 implies the results as stated in the theorem. □

Proof of Theorem 1: This follows by the same arguments as in [18, Theorem 3.1]. □

Proof of Corollary 3.2: This follows by the same arguments as in [18, Corollary 3.6]. □

Proof of Theorem 2: WLOG, we assume that $\mathbb{E}(K_n^{(i)}) = O(1)$ for $i = 1, \ldots, m$ by subtracting the
corresponding constant of the expected value from each output function. There exists a unitary matrix
$T = (t_{ij})_{1 \le i, j \le m}$ such that the variance-covariance matrix $\Sigma$ can be diagonalized as $T \Sigma T^\top = D$. The
diagonal matrix $D$ is the variance-covariance matrix of the linearly transformed random vector
$Y_n = T K_n$.

Then $\Sigma$ is singular if and only if the diagonal matrix $D$ is singular. This is equivalent to




(4) 


holds for a $j \in \{1, \ldots, m\}$. Now consider the output function $t_{j1} k_1 + \cdots + t_{jm} k_m$. By Theorem 1, (4)
is equivalent to

\[ t_{j1} k_1(C) + \cdots + t_{jm} k_m(C) = 0 \]

holding for all cycles of the final component (since the expected value of this output function is $O(1)$).

If we shift back the output function such that the expected value is no longer bounded, we obtain an
additional summand $a_0 \mathbb{1}(C)$.

The asymptotic joint normal distribution follows from Lemma 5.4 and the multidimensional
Quasi-Power Theorem [13, Theorem 2.22]. □